ABSTRACT. The role of parallel processing in heuristic search is examined by means of an example
(cryptarithmetic addition). A problem solver is constructed that combines the metaphors of constraint
propagation and hypothesize-and-test. The system is capable of working on many incompatible hypotheses at
one time. Furthermore, it is capable of allocating different amounts of processing power to running activities
and changing these allocations as computation proceeds. It is empirically found that the parallel algorithm is,
on the average, more efficient than a corresponding sequential one. Implications of this for problem solving
in general are discussed.

1. Introduction

Many AI systems that perform a "heuristic search" (i.e. they can be thought of as searching some space of 
possibilities for an answer) are based upon one or both of two programming techniques known as constraint 
propagation and hypothesize-and-test.

In a system based on constraint propagation, internal data structures represent (implicitly or explicitly)
potentially acceptable points in the search space. Computation proceeds in narrowing down these
possibilities by employing knowledge of the domain in the structure of the computation. There is not enough
space here to properly introduce the concepts involved in constraint propagation. The reader is referred to
some systems described in the literature [1, 10] for an introduction. One point we wish to emphasize about
pure constraint propagation is that at any time the internal data structures will be consistent with any solution
to the problem. Thus, if more than one solution is possible, pure propagation of constraints will be unable to
select only one of them. Further, even if a unique solution exists, a constraint propagation system may not be
able to find it.

The hypothesize-and-test methodology allows the program to make assumptions that narrow the size of the
search space; there is no guarantee that the assumption is consistent with any solution to the problem. The
program continues to make hypotheses until a solution is located or it has been determined that no solution is
possible with the current set of assumptions. There is no requirement that any hypothesis be correct and so
mechanisms must be available that prevent commitment to any hypothesis until it has been demonstrated to
be acceptable. The most commonly available mechanism is known as backtracking. Backtracking allows the
program to return to an environment that would exist had that assumption not been made.

As long as the search space is enumerable (a very weak assumption) hypothesize-and-test can be easily seen to
be logically more powerful. If there are several consistent solutions, a pure constraint propagation system has
no way to establish preference for one of them. Even if only one solution is possible a constraint propagation
system will not necessarily find it; this will be demonstrated later by example. The proponents of constraint
propagation point out that hypothesize-and-test is grossly inefficient in situations where constraint
propagation can function (see for example Waltz [12]). The example in this paper bears out this claim, though
one recent study [3] suggests there are situations in which pure backtracking is more efficient than constraint
propagation.

One can, however, imagine a composite system that has aspects of both constraint propagation and
hypothesize-and-test. In such a system, constraint propagation can be used to prune the search space, while
hypothesize-and-test continues the search where constraint propagation is not able to. A constraint
language that can support the creation of such systems has been constructed by Steele [11]. Steele allows
assumptions to be made and backtracking performed. The current work discusses another such system in
which the hypothesize-and-test methodology allows more than one assumption to be pursued concurrently. It
is an extension of earlier work discussing parallel problem solving systems [6, 7] and a language, Ether, for
implementing these systems. Here we examine one particular kind of search problem, cryptarithmetic
addition, of the sort used by Newell and Simon [9]. We study this problem, not because it is interesting in
itself, but because it is well-defined and test cases are relatively easy to come by. This allows us to test the
efficiencies of algorithms empirically. We have constructed a parallel problem solver for doing these
cryptarithmetic problems.

There are two main points we wish to make: 

1. That a system combining both constraint propagation and parallel hypothesize-and-test methodologies can 
be constructed. The code is simple to read, write, and understand. Example code is presented. 

2. That a parallel program for solving these puzzles can be constructed that requires less average run time,
when executed by time-slicing on a single processor, than a sequential program executed on the same
processor. Obviously, it matters which sequential and which parallel program
we compare; the benchmarks for this comparison will be explained later and are, I think, quite reasonable.
The speedup we are talking about here is not large, but is noticeable. The important point is that it is present
at all. A similar effect has been noticed in other studies for various problems [5, 7]. It suggests that
concurrency may be useful for the design of heuristic search algorithms whether the programs are
executed on concurrent hardware or on a conventional sequential computer.

The remainder of this paper consists of a discussion of the problem being solved and the nature of the parallel
solution. We show how the efficiency of the parallel program depends on the use of heuristic information for
allocating resources of the parallel program. We then develop a series of allocation strategies, each one
improving on the previous one. We finally discuss the importance of this experiment for a general theory of
problem solving. We show how the allocation strategies represent a use of what has been called meta-level
knowledge in the literature, i.e. knowledge about how to guide the search process to gain efficiency. In this
study, concurrency is necessary to make use of this meta-level knowledge.

2. The problem 

We are given three strings of letters, e.g. "DONALD", "GERALD", and "ROBERT" that represent integers when
substitutions of digits are made for each of the letters. There is at least one possible assignment of digits for
letters so that the numbers represented by the first two ("DONALD" and "GERALD"), when added, yield the
number represented by the third ("ROBERT"). Any one of these assignments is a solution. In the problems
we will be looking at, each will contain exactly ten letters. A solution consists of a mapping from these ten
letters onto the ten digits 0 through 9.
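
As a concrete reading of this definition, the following Python sketch checks whether a proposed assignment is a solution. The names `word_value` and `is_solution` are illustrative only and are not part of the Ether implementation; the sketch assumes that distinct letters must receive distinct digits, as a mapping of ten letters onto ten digits implies.

```python
def word_value(word, assignment):
    """Read a word as a decimal integer under a letter-to-digit mapping."""
    value = 0
    for ch in word:
        value = value * 10 + assignment[ch]
    return value


def is_solution(addend1, addend2, total, assignment):
    """True if the assignment is one-to-one and makes the addition correct."""
    # Distinct letters must map to distinct digits.
    if len(set(assignment.values())) != len(assignment):
        return False
    return (word_value(addend1, assignment) + word_value(addend2, assignment)
            == word_value(total, assignment))


# One assignment for the first test problem: 526485 + 197485 = 723970.
donald = dict(D=5, O=2, N=6, A=4, L=8, G=1, E=9, R=7, B=3, T=0)
print(is_solution("DONALD", "GERALD", "ROBERT", donald))
```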



3. A Constraint Propagation Solution 

In our construction of the constraint network we will use the actor model of computation. We find it a very
natural formalism for building these sorts of systems. In this formalism nodes of the network are
implemented as actors. Constraint propagation between nodes is implemented by sending of messages
containing the new constraints to the node being constrained. For our cryptarithmetic problem solver we
have three kinds of nodes: letters, digits, and columns. They are arranged as shown:

Arcs in the diagram indicate flow of constraints. Thus column nodes can constrain their left and right
neighbor columns and certain letter nodes (the ones representing letters contained within the column). Letter 
nodes can constrain digit nodes and column nodes that contain their respective letters. Digit nodes can 
constrain letter nodes. In the initial configuration, before constraint propagation begins, we store at each 
letter node a list of possible digits that contains all ten digits. Similarly, each digit node contains a "possible 
letters list" containing all ten letters. We will give a short description of what each node has to do when it 
receives a message informing it of a new constraint. 

Columns. A column can receive messages informing it of new constraints on letters it contains and on 
possible values for its carry-in and carry-out. If a column node receives any such messages, it computes 
possible new constraints on its letters, carry-in, and carry-out. If any one of these has no possible values a
CONTRADICTION is asserted. When a CONTRADICTION is asserted the code implementing 
hypothesize-and-test is invoked to take an appropriate action. New constraints on letters are sent to the 
respective letter nodes. New constraints on carry-in and carry-out are sent to the right and left neighbor 
columns respectively. 

Letters. Letters receive messages that indicate subsets of the digits 0 through 9 that they can possibly be. If
they learn of digits that they cannot be, nodes representing those digits are sent messages. Also, each column
that contains the letter receives a new message informing it of the new restrictions on the value of the
particular letter. If the set of possible digits becomes null, a CONTRADICTION is asserted.

Digits. These receive messages from letter nodes indicating that they are or are not the respective letter. If 
the set of possible letters is reduced to a singleton, a message is sent to the particular letter. If the set of 
possible letters is reduced to null, a CONTRADICTION is asserted.
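
The rule applied by a column node can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the Ether code: the function name `filter_column` and the `Contradiction` exception are hypothetical, the three letters of the column are treated as independent (a column whose two addends are the same letter would need an extra check), and the requirement that different letters take different digits is left to the letter and digit nodes.

```python
from itertools import product


class Contradiction(Exception):
    """Signals that some quantity in the column has no possible value."""


def filter_column(top, bottom, result, carry_in, carry_out):
    """Keep only values that occur in some consistent combination, i.e. one
    with top + bottom + carry_in == result + 10 * carry_out."""
    keep = [set() for _ in range(5)]
    for a, b, cin in product(top, bottom, carry_in):
        total = a + b + cin
        r, cout = total % 10, total // 10
        if r in result and cout in carry_out:
            for bucket, value in zip(keep, (a, b, r, cin, cout)):
                bucket.add(value)
    if any(not bucket for bucket in keep):
        raise Contradiction("column has no consistent values")
    return tuple(keep)


# If both addend letters are known to be 5 and no carry comes in,
# the sum letter must be 0 and a carry must go out.
print(filter_column({5}, {5}, set(range(10)), {0}, {0, 1}))
```

The narrowed sets for the letters would then be sent to the letter nodes, and the narrowed carry sets to the neighboring columns.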

We can observe some things about the ability of this system to satisfactorily derive a unique solution. First, if
there is more than one possible solution it will not find any of them. Since the letter and digit assignments of
each possible solution are certainly possible assignments, they will appear on the possibility lists attached to
each node. A fact that is not so easy to check by inspection, but which is easily demonstrable empirically, is
that even if there is only one possible solution (or no possible solutions) the system may not find it (or
discover that no solutions exist). Nevertheless, the knowledge can be said to be "present" in the network; if
the nodes of the network are instantiated with an assignment of letters to digits, the network will assert a
CONTRADICTION iff the assignment is not a solution. Our constraint network, then, needs the ability to
make assumptions and test them if it is to be able to solve these puzzles.



4. Hypothesize and Test in Ether 

The constraint network and hypothesize-and-test methodologies were written in the Ether language [6, 7]. We
will only give enough details about the implementation to support the ensuing discussion. The interested
reader is referred to [8] for a more detailed discussion of the implementation.

The primitive operations of the Ether language are based around the notion of an assertion rather than
message passing. Rather than coding in a message passing formalism "Send the node for the letter D the message that it is
5" we instead say "Assert that D is 5" and a process of compilation turns this assertional code into a message
passing implementation. For certain problems this process of compilation is important because certain ideas
can be expressed quite naturally in the assertional form that compile into very complex message passing code.
These issues will be discussed in [8].

Because we are interested in the possibility of pursuing more than one instantiation of the constraint network
in parallel, we need the ability to have more than one available for processing. For this we introduce the
notion of a viewpoint. Each viewpoint tags a mutually compatible collection of assumptions about the
possible values of letters and digits together with the constraints that derive via propagation from these
assumptions (i.e. a viewpoint is one particular instantiation of the network). Viewpoints are related to each
other by an inheritance mechanism. The viewpoint in which A is assumed to be 5 and B is assumed to be 4
might be a subviewpoint of the one in which A is assumed 5 and no other assumptions have been made.
Viewpoints are the repositories of assumptions and facts derived from these assumptions.
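
The inheritance relation between viewpoints can be illustrated with a small Python sketch. The class below is a hypothetical model, not Ether's viewpoint mechanism: each viewpoint stores only the restrictions made locally and consults its ancestors for everything else. It assumes a parent is not restricted further after a child has narrowed the same letter.

```python
class Viewpoint:
    """A set of assumptions layered on top of a parent viewpoint."""

    def __init__(self, parent=None):
        self.parent = parent
        self._possible = {}  # letter -> digits recorded in this viewpoint

    def possible(self, letter):
        """Digits the letter may still be, inherited from the nearest ancestor."""
        vp = self
        while vp is not None:
            if letter in vp._possible:
                return set(vp._possible[letter])
            vp = vp.parent
        return set(range(10))  # nothing known anywhere: all ten digits

    def restrict(self, letter, digits):
        """Record a new restriction locally; ancestors are left untouched."""
        narrowed = self.possible(letter) & set(digits)
        self._possible[letter] = narrowed
        return narrowed


root = Viewpoint()
a5 = Viewpoint(root)
a5.restrict("A", {5})
a5_b4 = Viewpoint(a5)          # subviewpoint of the one where A is 5
a5_b4.restrict("B", {4})
print(a5_b4.possible("A"), a5.possible("B") == set(range(10)))
```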

In order to be able to hypothesize and test we need to introduce some control primitives. These primitives are
built around a construct known as an activity. All processing that happens during execution happens under
the auspices of some activity. There are language constructs for conveniently grouping parts of a related task
into a single activity. For example, we can create an activity, make a new assumption in a viewpoint, and
cause all further work within the viewpoint (i.e. all further constraint passing in the instance of the network
defined by the assumption) to be part of the activity.



Activities are of interest because they give us ways to control quantities of system resources available for the
execution of alternative explorations. If we stifle an activity, all execution within the activity stops; a stifled
activity cannot be restarted. We also have the ability to control the rates at which non-stifled activities run.
Different activities can be assigned different amounts of processing power; the total amount of CPU time an
activity will get during an interval of time is proportional to its processing power. The processing power of an
activity can be altered by the system asynchronously with the running of the activity.
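
The relationship between processing power and CPU time can be sketched in Python. The `Activity` class and `cpu_shares` function are hypothetical illustrations rather than Ether primitives, and normalizing over the non-stifled activities is an assumption about how the proportionality is realized.

```python
class Activity:
    """A unit of work with an adjustable processing power."""

    def __init__(self, name, power):
        self.name = name
        self.power = power
        self.stifled = False

    def stifle(self):
        """Stop the activity for good; it receives no further CPU time."""
        self.stifled = True
        self.power = 0.0


def cpu_shares(activities, interval):
    """Split an interval of CPU time in proportion to processing power."""
    live = [a for a in activities if not a.stifled]
    total = sum(a.power for a in live)
    shares = {a.name: 0.0 for a in activities}
    if total > 0:
        for a in live:
            shares[a.name] = interval * a.power / total
    return shares


slow, fast = Activity("slow", 1.0), Activity("fast", 3.0)
print(cpu_shares([slow, fast], 100.0))
fast.stifle()
print(cpu_shares([slow, fast], 100.0))
```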

Systems using hypothesize-and-test can be constructed in Ether by using viewpoints to represent assumptions
made, and activities to control which parts of the search space are explored, and with what vigor.

5. A Simple Parallel Solution 

In this treatment we will ignore many details of how both the Ether system and the cryptarithmetic system
implemented within it are constructed. If we wish to "create a new instance of the constraint network" that
inherits from another, we create a new viewpoint (using the new-viewpoint construct). To add an assertion
about a letter being associated with a digit within the context of this viewpoint, we execute
(assert (one-of →letter (→digit))) where letter and digit are bound to the respective letter and digit
which we want to assume are identified in this viewpoint. The second argument to one-of is a list of possible
digits that the letter can be. So, for example, we could execute (assert (one-of S (1 3 5 7 9))) to indicate
that S is odd. Ether syntax makes use of a quasi-quote convention in which symbols prefaced by the character
"→" are substituted with the values of the associated symbols. If letter were bound to "D" and digit were
bound to "5", the item actually asserted would be (one-of D (5)). If the assert is executed within the
context of a certain activity, then all work propagating constraints that follow from that assertion will happen
within that activity.

The implementation described in this section is quite simple. It first creates a viewpoint in which no
assumptions are made and continues propagating constraints within this viewpoint until it has quiesced, i.e. no
more propagation can happen. When this state has been achieved, if not every letter has a unique digit
that it can be identified with, it is determined which letter has the fewest possible digits that it can be
(excluding those letters that already have a unique assignment). For each one of these digits, a new viewpoint
and a new activity are created. Within these (in parallel), the letter is asserted (assumed) to be the digit and
propagation of constraints continues. If quiescence is reached in this new activity and the problem has not
been solved, we recurse.

The function shown below takes a letter, a list of alternative digits, and a viewpoint. It uses the environment
contained in the viewpoint to create new subviewpoints in which the letter is assumed to be each of the
alternative digits. We first check to see if there is at least one possible digit. If not, there cannot be a possible
solution to this problem consistent with the parent viewpoint and so we assert that there is a contradiction
within the parent viewpoint. Otherwise we iterate over each digit in the alternatives list and for each one we
create a new viewpoint whose parent is the parent viewpoint and a new activity with parent start-act and
assert the letter is the particular digit; this initiates propagation of constraints. If we discover there is a
contradiction within the viewpoint (this is accomplished by the code fragment beginning with
"(when {(contradiction)}") we assert within the parent viewpoint that the letter cannot be the particular
digit. We are justified in doing this because the only difference, in terms of assumptions made, between the
current viewpoint and the parent viewpoint is the one assumption of the letter being identified with a
particular digit that was a possible alternative in the parent viewpoint; if this assumption leads to a
contradiction, we know that this is not a possible identification for the letter. In addition we stifle (stop from
executing) the activity that was pursuing the now known to be invalid assumption. We further check to see if
the activity quiesces in the section of code beginning with "(when {(quiescent →a)}". If this has occurred,
we first check to see if the problem has been solved. If so we are done; otherwise we determine the letter in
the viewpoint with the least number of possible digits (but greater than 1) and recursively call parallel-solve
on this.
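
The control structure just described can be approximated by the Python sketch below. It is not the Ether function: `explore`, `Contradiction`, and the `propagate` and `solved` callbacks are hypothetical names, a viewpoint is modeled as a dictionary from letters to sets of digits, and the alternatives are tried one after another rather than as concurrent activities, so abandoning a branch stands in for stifling its activity.

```python
class Contradiction(Exception):
    """Raised by `propagate` when some letter has no possible digit left."""


def explore(letter, alternatives, parent, propagate, solved):
    """Assume `letter` is each alternative digit in a copy of `parent`.
    Returns a solved table of candidates, or None if every digit fails."""
    if not alternatives:
        raise Contradiction(f"no digit left for {letter}")
    for digit in list(alternatives):
        # New sub-viewpoint: a copy of the parent plus one assumption.
        child = {l: set(ds) for l, ds in parent.items()}
        child[letter] = {digit}
        try:
            propagate(child)  # run constraint propagation to quiescence
            if solved(child):
                return child
            open_letters = [l for l in child if len(child[l]) > 1]
            if not open_letters:
                raise Contradiction("fully assigned but not a solution")
            # Recurse on the letter with the fewest (but more than one) digits.
            nxt = min(open_letters, key=lambda l: (len(child[l]), l))
            found = explore(nxt, sorted(child[nxt]), child, propagate, solved)
            if found is not None:
                return found
        except Contradiction:
            pass
        # The assumption failed: in the parent, letter cannot be this digit.
        parent[letter].discard(digit)
    return None
```

A caller would first propagate constraints in the root table, pick the letter with the fewest remaining digits, and pass that letter and its digits to `explore`.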

inconsistent. Asserting that it is inconsistent will trigger the activity monitoring the next higher viewpoint to
pick the next possibility on its list. Depth-first is a degenerate case of parallel search in which only one 
activity at a time is given non-zero processing power. 

6.1 Using Heuristic Information to Control Resource Allocation 

A simple elaboration we can make to the parallel implementation presented that preserves its parallel 
character is to vary the processing power based on an assessment of how likely the assumptions we have made 
within its associated viewpoint are to lead to useful information (either leading to a solution or determining 
that the viewpoint is contradictory). We base the quantity of processing power allocated to the activity doing
the exploration on the numerical value of this judgement. For this particular problem, we are more likely to 
learn in a short period of time whether a viewpoint contains a valid solution or is contradictory if it is already 
fairly well constrained, i.e. if the letters in the viewpoint only have a few possible digits that they could be. 
After some experimentation we came upon the following formula for determining relative processing power 
allocations for the various different activities participating in the search: 

$$\left((10 - n_1)^2 + \cdots + (10 - n_{10})^2\right)^2$$

where each $n_i$ is the number of possible digit assignments for the letter i in the viewpoint. If the letters tend to
have fewer possible digits, the terms $(10 - n_i)$ will tend to be large. Squaring each term, and
squaring the final sum, serves to accentuate the relative differences between the different viewpoints. When
the system is first set up, a separate activity known as the manager activity continually monitors each of the
other running activities and evaluates this function for each associated viewpoint. The processing power
allocations to these activities are adjusted in proportion to the numerical value of this formula. The Ether
command we use for modifying the processing power allocations of an activity is called support-in-ratios.
It takes three arguments: an activity, a list of activities (that are children of the first) and a list of non-negative
numbers with the same number of elements as the list of activities. The processing power assigned to the
parent activity is (re)divided among the children activities in proportion to the numbers in this list. Thus, if a
factor for a given activity is 0 the activity gets no processing power; if the factor associated with the activity is
twice the factor associated with another, then the former activity gets twice as much processing power as the
latter. The allocator described is implemented as follows:

(defunc square-both-allocator ()
  (support-in-ratios
    parent start-act
    activities currently-exploring-activities
    factors (forlist vpt currently-explored-viewpoints
              (let ((status (quiescent-letter-constraints vpt))
                    (sum 1))
                (foreach pair status
                  (increment sum (expt (- 10. (length (cadr pair))) 2)))
                (max (expt sum 2) 1)))))
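
For readers unfamiliar with Ether, the same computation can be restated as a short Python sketch. The names `square_both` and `split_in_ratios` are hypothetical stand-ins, not Ether primitives; the sum is started at zero as in the formula given in the text, and the result is clamped to at least 1 as in the Ether code.

```python
def square_both(possible):
    """Evaluate a viewpoint: `possible` maps each letter to its candidate digits."""
    total = sum((10 - len(digits)) ** 2 for digits in possible.values())
    return max(total ** 2, 1)


def split_in_ratios(parent_power, factors):
    """Divide the parent's processing power among children in proportion to
    non-negative factors; a factor of 0 yields no processing power."""
    total = sum(factors)
    if total == 0:
        return [0.0 for _ in factors]
    return [parent_power * f / total for f in factors]


tight = {"A": {5}, "B": {1, 2}}              # (9**2 + 8**2)**2
loose = {"A": set(range(10)), "B": set(range(10))}
print(square_both(tight), square_both(loose))
print(split_in_ratios(3.0, [2, 1, 0]))
```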

We create a separate activity at top-level called the manager-activity and execute the following to have the
allocation strategy continually called asynchronously with the activities doing the actual search:

(within-activity manager-activity
  (continuously-execute
    (funcall #'square-both-allocator)))

The manager-activity is given a processing power of .1 (meaning it will use, on the average, a tenth of the
total CPU time for the entire run).

This scheme gives considerably better performance than the simple parallel solution. It does better than the
backtracking solution on some examples with a single processor implementation, although on the average the
backtracking solution is more efficient. It is important to understand the source of this improvement. We
have a scheme for estimating the likelihood that a running activity will return useful information in a short
period of time. We allocate more resources to those activities that we estimate will supply us with information
for the least amount of resource expenditure. Assuming our heuristic is reasonable, the average time to
complete the search is reduced.

There are three more improvements we have made to the processing power allocation strategy before
reaching the final strategy for which we have collected data in the next section. Each will be described in
turn.



6.2 Concurrency Factors 

We have observed in the allocation strategy discussed thus far that even though activities are running with
different amounts of processing power that are related to our estimate of the utility of getting useful
information back from them, there still seem to be so many activities running that they tend to thrash against
one another. We would like to limit the amount of concurrency so that the running activities can get
something done. For this purpose we introduce the notion of a concurrency factor. Instead of letting all
runnable activities run, we pick the n most promising activities (using the metric above), where n is the
concurrency factor, and give only those activities processing power, in the ratios defined by the metric.
The optimal value for the concurrency factor is picked experimentally and is discussed below.
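
A minimal Python sketch of this selection step follows. The function name `allocate_top_n` and the dictionary representation of activities are assumptions made for illustration; the scores stand for the values of the metric above.

```python
def allocate_top_n(scores, n, parent_power=1.0):
    """Give processing power only to the n highest-scoring activities,
    in proportion to their scores; all others get none."""
    chosen = sorted(scores, key=scores.get, reverse=True)[:n]
    total = sum(scores[name] for name in chosen)
    powers = {name: 0.0 for name in scores}
    if total > 0:
        for name in chosen:
            powers[name] = parent_power * scores[name] / total
    return powers


scores = {"a": 6.0, "b": 3.0, "c": 1.0}
print(allocate_top_n(scores, 2, parent_power=9.0))
```

With a concurrency factor of 1 only the single best-rated activity runs, which corresponds to the depth-first behavior discussed elsewhere in the paper.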

The value of the concurrency factor that yields the best result is a reflection of two aspects of the problem: the
quality of our heuristic knowledge and the distribution of computational expense for picking bad branches in
the search. Obviously if our heuristic knowledge were perfect, i.e. it could always point to the correct branch
to explore next, the optimal concurrency factor would be 1 -- it should simply explore this best branch. If we
are less sure about which is the best, more branches should be explored. Also, if the computational
cost of exploring a bad branch is always small, a small concurrency factor would be appropriate. If, however,
the cost of a bad branch can be very large we would want to use a larger concurrency factor. With a small
concurrency factor we increase the probability that the problem solver will become stuck for a very long time.
A limiting case of this is with a search space that is infinite (introducing the possibility of a bad branch that
never runs out of possibilities) and a concurrency factor of 1. If the problem solver happens to pick one of
these branches it will diverge.

Hayes-Roth has noted an analogy with portfolio theory, the purpose of which is to pick an investment strategy
that will yield the greatest expected capital appreciation. Uncertainty about the future performance of certain
industries and volatility in the market place argue for greater diversification of the portfolio.

6.3 Estimating Which Assumptions Are Most Valuable 

Our strategy so far has been to use hypothesize-and-test on one letter only in each viewpoint. We sprout one
new viewpoint and activity to test the hypothesis that that letter is each one of the digits it could possibly be in
the parent viewpoint. This is not necessarily the best strategy. By hypothesizing a letter is a certain digit we
may learn a lot or a little. We have "learned a lot" if we (1) discover quickly that a viewpoint is contradictory,
or (2) cause a lot of constraint propagation activity that significantly increases our evaluation of the new
viewpoint. One thing we have observed is that the amount we learn from assuming a letter is a particular digit
does not significantly depend on which digit we use. In other words, if we assume the letter N is 2 and discover
a contradiction, then we are likely to either discover a contradiction or significantly constrain our solution by
assuming N is any other digit on its list of alternatives. To take advantage of this phenomenon the program
remembers what happened when it makes particular assumptions. When it creates a new viewpoint to study
the result of assuming a letter is a particular digit the result is recorded in the parent viewpoint when it has
completed. There are two possible results. If it led to a contradiction this fact is recorded. If it led to a
quiescent (but consistent) state it records the difference of the evaluation metric applied to the parent
viewpoint and the evaluation metric on the quiescent viewpoint -- our estimate of the amount of reduction
that is likely to be obtained by assuming this letter to be a digit. Our new evaluation metric attempts to take
this information into consideration. When assuming a letter L is a specific digit we use the old evaluation
metric if we have never assumed L to be a particular digit from this viewpoint; otherwise, we use
the average of the evaluations for each of the resultant viewpoints. We then multiply this figure by the factor
1 + .5 * n where n is the number of digits that we have assumed L to be and determined that they lead to
contradictions.
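
The revised estimate can be written as a small Python function. This is a sketch under stated assumptions: the name `revised_estimate` and its parameters are hypothetical, the values in `recorded` stand for whatever evaluations were stored for earlier consistent trials of the same letter, and falling back to the old metric when every earlier trial ended in a contradiction is our own choice, since the text does not cover that case.

```python
def revised_estimate(old_metric, recorded, contradictions):
    """Estimate the value of assuming letter L to be some digit.

    old_metric     -- evaluation used when L has never been tried here
    recorded       -- evaluations stored for earlier consistent trials of L
    contradictions -- number of digits tried for L that led to contradictions
    """
    if recorded:
        base = sum(recorded) / len(recorded)
    else:
        base = old_metric  # assumption: also used if all trials contradicted
    return base * (1 + 0.5 * contradictions)


print(revised_estimate(100.0, [], 0))
print(revised_estimate(100.0, [40.0, 60.0], 2))
```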

Now that we have a mechanism for taking advantage of information learned by making different assumptions
we would like to ensure that a variety of choices are tried at each branching point. We will slightly modify the
technique for picking the activities to be run at any given time (in accordance with the concurrency factor).
Where c is the concurrency factor, we use the following algorithm to pick the c activities to run at a given
time:

1. The activity with the highest evaluation is scheduled.

2. If n < c activities have been selected for running, the n + 1st activity is (a) the one with the highest metric if
it does not duplicate any of the first n activities in terms of which letter it is making an assumption about for a
given viewpoint, or (b) the highest rated non-duplicated activity unless the highest rated activity has a rating
at least three times higher in which case we use the highest rated activity. The factor three was picked
experimentally and is based on the following argument. There is a certain advantage in having a diversity of
letters being tested because this gives us a greater chance to discover assumptions that will cause significant
shrinkage by constraint propagation. However, there is also an advantage to running the activity that we have
estimated will give us the best result. The factor three is the ratio of estimates for expected gain for which we
would rather run the higher estimated test than one that will increase our diversity.
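
One possible reading of this selection rule is sketched below in Python. The function name `pick_activities` and the tuple layout of a candidate are hypothetical, and an activity is taken to "duplicate" an earlier choice when it makes an assumption about the same letter in the same viewpoint.

```python
def pick_activities(candidates, c, ratio=3.0):
    """Choose up to c activities to run.

    candidates -- list of (name, viewpoint, letter, rating) tuples
    ratio      -- how much better a duplicate must be to win over diversity
    """
    remaining = sorted(candidates, key=lambda x: x[3], reverse=True)
    chosen, seen = [], set()
    while remaining and len(chosen) < c:
        best = remaining[0]
        pick = best
        if (best[1], best[2]) in seen:
            # Highest-rated activity repeats a (viewpoint, letter) pair;
            # prefer a fresh one unless best is at least `ratio` times better.
            fresh = next((x for x in remaining if (x[1], x[2]) not in seen), None)
            if fresh is not None and best[3] < ratio * fresh[3]:
                pick = fresh
        chosen.append(pick)
        seen.add((pick[1], pick[2]))
        remaining.remove(pick)
    return [p[0] for p in chosen]


candidates = [
    ("n2", "root", "N", 90.0),
    ("n3", "root", "N", 80.0),
    ("e1", "root", "E", 40.0),
    ("t7", "root", "T", 10.0),
]
print(pick_activities(candidates, 3))
```

In this example the second slot goes to the activity testing E, because 80 is less than three times 40; the third slot goes back to the second N activity, because 80 is more than three times 10.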

7. An Experiment

In order to test for the existence of a speed-up with concurrency we timed 10 problems using the final parallel 
algorithm described above for several concurrency factors. The problems tested are:

1) DONALD + GERALD = ROBERT 

2) CRIME + TRIAL = THIEF 

3) POTATO + TOMATO = VEGIES 

4) MIGHT + RIGHT = MONEY 

5) FUNNY + CLOWN = SHOWS 

6) FEVER + CHILL = SLEEP 

7) SHOVEL + TROWEL = WORKER 

8) TRAVEL + NATIVE = SAVAGE 

9) RIVER + WATER = SHIPS 

10) LONGER + LARGER = MIDDLE 

They were picked by a trial-and-error process of selecting possible problems and then running them to see if
they have a solution. It is not known whether they have one or more than one solution. The program finishes
when it has found one solution. These tests were run on the MIT Lisp machine, a single user machine
designed for efficient execution of Lisp programs. The times represent processor run time only and are
adjusted for time lost due to paging. The manager activity, which continually monitors the state of the search
activities and readjusts processing power accordingly, receives a processing power allocation of .1. We tested
with concurrency factors between 1 and 7. Numbers 2 through 7 each gave some improvement with 4 being
the best. Here we report the results for concurrency factors 1 and 4. Times reported are in seconds: 







             concurrency    concurrency
 problem     factor = 1     factor = 4     ratio

   1)            377            140         2.69
   2)             85            153         0.66
   3)            167            192         0.87
   4)             79            246         0.32
   5)            663            227         2.92
   6)           2868            348         8.24
   7)            241            112         2.15
   8)             78            335         0.23
   9)           1920            554         2.55
  10)            474            212         2.24

 total:         6952           2519         2.76



With a concurrency factor of 1 the algorithm becomes, functionally, a depth-first search. A concurrency 
factor of 4 represents the value which yields the least average run time for the problems examined. Concurrency 
factors larger and smaller yield higher average values. We caution the reader not to take the numbers too 
seriously. We only wish to demonstrate that the parallel algorithm runs with some improvement of efficiency 
over the sequential algorithm. 

Some interesting facts can be learned by examining the data. Although the parallel solution beat the 
sequential solution in only 6 of the 10 cases, these six cases are the ones for which the sequential solution takes 
the longest. In particular, problems 6 and 9 show by far the longest times for the sequential solution, and 
the time saving of the parallel solution is considerable. Conversely, for the cases in which the sequential 
solution finished quickly, the parallel solution tended to take longer. This phenomenon is fairly easy to 
explain. The parallel solution supplies "insurance" against picking bad branches in the search space. If the 
sequential solution happened to pick a bad branch (or several bad branches) there was no recourse but to 
follow it through. On the other hand, if the sequential program found a relatively quick path to the solution, the extra 
efficiency of the parallel solution was not needed. 
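
These observations can be checked directly from the reported times. The short Python sketch below is an illustration added for the reader and is not part of the original system; the names `TIMES` and `summarize` are hypothetical, and the sequential time for problem 6 is taken to be 2868 seconds, the value consistent with the reported total of 6952.

```python
# (sequential seconds, parallel seconds) for problems 1 through 10,
# i.e. concurrency factor 1 versus concurrency factor 4.
TIMES = [
    (377, 140), (85, 153), (167, 192), (79, 246), (663, 227),
    (2868, 348), (241, 112), (78, 335), (1920, 554), (474, 212),
]


def summarize(times):
    """Return totals, the overall ratio, and which problems the parallel run won."""
    seq_total = sum(seq for seq, _ in times)
    par_total = sum(par for _, par in times)
    # Problems (numbered from 1) on which the parallel run was faster.
    wins = [i for i, (seq, par) in enumerate(times, start=1) if par < seq]
    # The same number of problems, chosen by longest sequential time.
    by_seq_time = sorted(range(1, len(times) + 1),
                         key=lambda i: times[i - 1][0], reverse=True)
    slowest = sorted(by_seq_time[:len(wins)])
    return {
        "seq_total": seq_total,
        "par_total": par_total,
        "ratio": seq_total / par_total,
        "wins": wins,
        "slowest_sequential": slowest,
    }
```

Calling `summarize(TIMES)` gives totals of 6952 and 2519 seconds, and the list of problems won by the parallel run (1, 5, 6, 7, 9, 10) coincides with the six problems that take longest sequentially.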

8. Conclusions 

We have demonstrated that cryptarithmetic puzzles can be solved with a certain increase in average efficiency 
by the parallel algorithm described over a more traditional depth-first search solution. While this result in 
and of itself is of little use, it does demonstrate a tool that may be of great use in heuristic programming: the 
use of parallelism to control a heuristic search. Several writers have pointed to the use of meta-level 
knowledge (e.g. Davis [2]) in controlling a search. Meta-level knowledge is knowledge about how to use the 
problem solving tools at hand in a way that increases overall search efficiency. The allocation strategies we 
have examined are meta-level knowledge for cryptarithmetic problems. By allowing a few activities to run in parallel, 
with controllable amounts of processing power, we are able to increase the efficiency of the search. 
Although the increase we gained is not dramatic, there is reason to suspect that it would be more significant in 
more interesting problems. The size of the search space in these problems is relatively small. Thus 
picking a "bad branch" in the search cannot be too catastrophic. With a search space that is much larger, and 
possibly infinite (as is the case with many interesting problems), a bad branch using a parallel search can only 
do a bounded amount of harm, bounded by the quantity of processing power allocated to it. 
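
The bounded-harm argument can be illustrated with a toy scheduler. The sketch below is a simplified model and not the system described in this paper: branches are Python generators that yield `None` while still working and a non-`None` value when they find a solution, a generator that ends stands for a refuted hypothesis, and the integer `shares` (steps per round) stand in for processing power. The names `finite_branch` and `run_interleaved` are hypothetical.

```python
import itertools


def finite_branch(steps, answer):
    """A branch that works for `steps` steps and then reports `answer`."""
    for _ in range(steps - 1):
        yield None
    yield answer


def run_interleaved(branches, shares, max_steps=10_000):
    """Run branches round-robin, giving each `shares[name]` steps per round.

    Returns (winner, result, steps_used); winner is None if no branch
    succeeds within `max_steps` total steps.
    """
    live = dict(branches)
    steps_used = {name: 0 for name in branches}
    total = 0
    while live and total < max_steps:
        for name in list(live):
            for _ in range(shares[name]):
                try:
                    result = next(live[name])
                except StopIteration:
                    del live[name]  # hypothesis refuted, drop the branch
                    break
                steps_used[name] += 1
                total += 1
                if result is not None:
                    return name, result, steps_used
    return None, None, steps_used


# An endless bad branch that was rated three times more promising than
# the branch that actually leads to a solution.
branches = {"bad": itertools.repeat(None), "good": finite_branch(50, "solved")}
winner, result, used = run_interleaved(branches, {"bad": 3, "good": 1})
```

A depth-first search that entered the bad branch first would never return. Here the bad branch consumes 150 steps, three for every step of the good branch, before the good branch reports its solution after 50 steps of its own.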

We introduced several concepts that were used in the construction of the allocation strategy. Processing 
power is allocated in proportion to an estimate of how likely we are to get useful information out of the 
exploration of a branch. Concurrency factors have been introduced to keep the problem solver reasonably 
focused. A certain amount of diversity is incorporated in the algorithm to increase the likelihood of 
discovering assumptions that will lead to valuable information quickly. Although the only 
problem we have examined is cryptarithmetic, there is nothing about these general strategies that is specific to 
cryptarithmetic. They contribute to a general theory of parallel problem solving. 
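
Proportional allocation is simple to state in code. The following Python sketch assumes that total processing power is normalized to 1.0 and that the manager's share of 0.1 is taken off the top; both are assumptions made for illustration, as are the function name `allocate_power` and the equal split used when every estimate is zero.

```python
def allocate_power(estimates, manager_share=0.1):
    """Split the non-manager processing power in proportion to the estimates.

    `estimates` maps an activity name to a non-negative estimate of how
    likely the activity is to yield useful information.
    """
    if not estimates:
        return {}
    available = 1.0 - manager_share  # assumption: total power is 1.0
    total = sum(estimates.values())
    if total <= 0:
        # No information to distinguish the activities: share equally.
        return {name: available / len(estimates) for name in estimates}
    return {name: available * est / total for name, est in estimates.items()}
```

Because the manager can call such a function again whenever the estimates change, the allocations can be revised while the activities are running.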

The form of the code is quite simple to write and understand. The algorithm consists of a mixture of 
constraint propagation and parallel hypothesize-and-test. The programs involve asynchronous, concurrent 
activities processing different sets of assumptions. Furthermore, the resources allocated to these activities can 
be altered asynchronously with the execution of the activities. 

We have demonstrated that introducing concurrency in the search process does actually increase overall 
efficiency; in particular, it does no harm. This lends support to efforts to design a computing system for 
message passing languages that involves many intercommunicating autonomous processors (e.g. Hewitt [4]). 
It suggests there is inherent concurrency in search problems that could be gainfully run on multiple 
processors. 